Modeling the impact of COVID-19 on future tuberculosis burden

Background The ongoing COVID-19 pandemic has greatly disrupted our everyday life, forcing the adoption of non-pharmaceutical interventions in many countries and putting public health services and healthcare systems worldwide under stress. These circumstances are leading to unintended effects such as the increase in the burden of other diseases. Methods Here, using a data-driven epidemiological model for tuberculosis (TB) spreading, we describe the expected rise in TB incidence and mortality if COVID-associated changes in TB notification are sustained and attributable entirely to disrupted diagnosis and treatment adherence. Results Our calculations show that the reduction in diagnosis of new TB cases due to the COVID-19 pandemic could result in 228k (CI 187–276) excess deaths in India, 111k (CI 93–134) in Indonesia, 27k (CI 21–33) in Pakistan, and 12k (CI 9–18) in Kenya. Conclusions We show that it is possible to reverse these excess deaths by increasing the pre-covid diagnosis capabilities from 15 to 50% for 2 to 4 years. This would prevent almost all TB-related excess mortality that could be caused by the COVID-19 pandemic if no additional preventative measures are introduced. Our work therefore provides guidelines for mitigating the impact of COVID-19 on tuberculosis epidemic in the years to come.

• Re-infection processes: individuals in the slow latency reservoir can get re-infected, a fraction of which will develop TB fast after re-infection. This is modeled as a transition from L S to L F .
• Development of active TB: infected individuals (either L F or L S ) may develop initially undiagnosed -and thus untreated-TB (D states).
• TB diagnosis: with some delay after the disease onset, TB gets diagnosed and treatment starts (transition from D states to T ones) • Treatment outcomes: (transitions from T states to R ones) different possible outcomes are possible: -either success or failure/default-(failure of treatment moves individuals to F) • Disease relapse: (transitions back from T to R) • Death: active TB patients, either diagnosed or not, are assigned a TB-specific mortality rate.
The specific details about the mathematical parameterization of these and other model transitions are explicitly enumerated and described in the supplementary appendix of Arregui et al. work 1 , and the ODE system that governs the evolution in each compartment is also reported in the next section. Altogether, they define the evolution of individuals with time, which can finally get aggregated across age strata to define global estimates of TB burden.
Within this scheme, infections occur after contact between susceptible individuals and infectious ones. If S(a, t) represents the number of susceptible subjects in the age group a, at a given time t, the number of new infections that will be observed will be equal to the product of S(a, t) and the force of infection perceived by that sub-population, λ(a, t), which represents the fraction of susceptible individuals who get infected per year. In turn, the force of infection is proportional to the following sum: where Υ(a ′ , t) is the density of all the infectious individuals within age-group a ′ at time step t, weighted by their relative infectiousness; and ξ c (a, a ′ , t) represents the relative contact frequency that an individual of age a has with individuals of age a ′ at time t, with respect to the overall average of contacts that an individual has per unit time with anyone else. For the computation of the contact matrices used in our model, we have integrated data from different survey studies conducted in African countries (Kenya 3 , Zimbabwe 4 and Uganda 5 ), to obtain a unique matrix broadly representative of contact structures in Africa 1,6 . Importantly, we also take into account that, as the demographic structure of the population changes, the contact patterns change too 6 , being this the origin of the dependence over time we found in matrices.

Ordinary differential equations system
In the previous section, we included the TB natural history which is used in the model as the compartments of the compartmental model. The evolution of the population in each compartment is given by an ordinal differential equation (ODE) which captures the transitions that either introduce new individuals or withdraw them. The following system of differential equations describes, then, the evolution of the different dynamical states of the model: where δ(a) stands for the Dirac delta function (δ(x = 0) = 1 and δ(x ̸ = 0) = 0). There are three quantities that depend on time: the force of infection λ(a, t), the diagnosis rate d(t) and the correction terms ∆ N (a, t), standing for any demographic variation in the population due to causes foreign to TB and aging.
For further information about the ODE system, and the set of parameters that appear in them, the reader is referred to the supplementary materials of the original source 1 , in which they are described exhaustively. In summary, the parameters appearing here represent either probabilities (for example, probability of developing one kind of TB or another, or success probability of under-treatment individuals) or rates (for instance, the rate of progression from slow and fast latency towards developing active TB) that are reported in the literature and capture the progression of individuals across the states of the model.

Population dynamics across age strata: individuals' ageing and demographic evolution.
This model includes the simultaneous description of the disease dynamics across all age groups in an entire population, using parameters that are, in general, dependent on age. Therefore, it is not enough to describe how the sub-populations associated with the disease states evolve, but the model also includes an aging dynamics whereby individuals transit across the different age strata as they get older. In Supplementary Figure 2 we schematize this phenomenon as the transitions between age strata inside each pyramid and the change in the shape of the demographic pyramid with time.
Two additional ingredients that are key to describing the evolution of the population and the aging are then included in our model. First, we consider empirical data and forecasts to model past and future fertility levels, respectively. Second, we introduce continuous correction terms ∆ N (a, t) that are added or subtracted from the population within the age stratum a at time t while the simulation unfolds.
These terms are calculated dynamically to make the time evolution of the demographic pyramid match the demographic forecasts reported in the United Nations population division database until the end of the simulation. This way, the correction terms ∆ N (a, t) are distributed among all disease states proportionally to their relative size. These terms represent changes in the population of each stratum that are unrelated to the dynamics of the disease (TB unrelated mortality and migratory fluxes).
In a summary, demographic pyramids in Supplementary Figure 2 evolve with time in such a way that individuals transit inside the pyramid as they age, but also we update the whole population in a data-driven way. That is, we update the number of individuals inside each age strata according to the prospects of the UN population division, where the correction terms ∆ N (a, t) are used for this purpose. This assures that both the aging and the demographic prospects are included in the model, capturing the evolution of the population.

Propagation of the uncertainty
The uncertainty sources 1 are summarized here: • Parameters associated with the Natural History of the disease are treated as totally independent uncertainty sources.
• TB burden estimations provided by the WHO. Based upon several case notifications surveilled in each country, the World Health Organization provides estimations for incidence and mortality rates for almost all countries.
• Demographic structures that are reported in the form of demographic pyramids with 15 age groups, the first 14 from 0y to 70y each 5y, and the last one comprises 70+y. Each group has its uncertainty but we modify them as a whole and not as 15 independent uncertainty sources.
• Contact matrices, whose uncertainty comes from the variability between studies used for building regional contact matrices.
Those uncertainty sources are managed independently and their contributions to the uncertainty of a certain model outcome x are to be evaluated. On the original version of the model 1 the uncertainty of an outcome was translated by doing a sensibility analysis that accounts for the variation produced in the outcome when the median value of an uncertainty source is changed by its extreme values and the uncertainty is built by grouping individual sensitivities according to the type of input data for building credibility intervals and significance levels for the outcome.
In this work, thus, this procedure is deprecated and has changed radically, as the old procedure was conservative in the sense that for each uncertainty source the worst-case scenario was assumed always, which is not always a realistic approach to spreading. This approach is replaced, and a probabilistic procedure is implemented. The process performed in the new version of the model is described here: 1. Generate a set of N new values of the uncertainty sources taking into account the original value and its CI. The generation of new values is not trivial as we cannot treat all the uncertainty sources in the same way. Further details of the generation procedure are explained in the next section.
2. Calibrate the model for each new set of parameters.
3. Run the model for each new set of parameters, thus obtaining a set of N values for a certain outcome.
4. The outcome is obtained as the median value and the CI interval as the 95% quantile of the set of outcomes.
Proceeding this way we can generate a CI for any outcome the model can give, assuring that all the uncertainty sources are taken into account. The generation process of the new values for each uncertainty source is described in the following section.

Generation of new values for the uncertainty sources
In general, we assume that the reported confidence intervals of each value can be interpreted as coming from a normal distribution, thus being the reported value the mean and building the standard deviation from the CI, so we have p i ∼ N (µ, σ).
This allows us to generate random numbers under those normal distributions that serve as our new value for the uncertainty source, but caution is needed, as there are several points we must consider. First, some parameters that belong to the interval ∈ [0, ∞), so their normal distribution must be renormalized in such a way that all the probability lies within the [0, ∞) interval. Other parameters represent probabilities, so not only do they lie in the interval [0, 1] but also are bounded to some other parameters. As an example, if we take the treatment success probability and calculate a new value that is higher than the original, then the probability of not succeeding must decrease accordingly.
Here, the random number generation procedure trust in R, as we use the random number generators included in the base package (R 3.4.4).
In the case of contact matrices, we consider each value of the matrix as independent from the others, as each number is obtained through surveys performed in all age groups, and represents the contact they establish with the other age groups. This means that the matrices are not symmetrical in the sense that they do not report the same number of contacts between age group a towards age group a' as contacts from a' to a, so they do not show this constraint. Thus, a new matrix is obtained by taking each M a,a ′ and recalculating it within its distribution. As the number of contacts or the contact rate must lie within the interval ∈ [0, ∞), they cannot be negative.
Moving to the TB burden, there are two magnitudes reported here, incidence and mortality per million population from the year 2000 to the year 2018. As our model calibrates both the infection force and the diagnosis rate based on this data, we can modify incidence and mortality separately. Nevertheless, values inside each block must be changed in the same way, so if the new incidence in the year 2000 is higher than the original, then all the incidence values of the other years must be higher than their previous values too, as we are moving towards a high incidence setting. Then, we generate new values for the TB burden by drawing a Z-score from the normal distribution N (0, 1) and then using it for calculating the new incidence of each year j, thus: where σ j is the standard deviation of the normal distribution of the j-th year. Then this is repeated with a new Z-score for the mortality so both incidence and mortality are modified as a block. Finally, in dealing with demography we decided to act on the shape of the demographic pyramid instead of modifying the global population. This way the change from one year to another is not too abrupt. Let's denote an arbitrary age group as a so a ∈ [1,15]. A new Z-score, namely Z ′ , is drawn from N (0, 1) and a new value for the base of the pyramid (a=1) is calculated by: Then, we modify the rest of the age groups from the base to the top of the pyramid but using a Z-score given by: where M (Z ′ ) = − 1 7 Z ′ and K(Z ′ ) = 8 7 Z ′ . This assures that Z(1) = Z ′ and Z(15) = −Z ′ , which implies that the excess or lack of population in the base groups balance out by the contrary effect in the top of the pyramid, preserving the total population. Consequently, the new value for any group is given by: Up to this point, we have been able to generate a new set of values for all the uncertainty sources described before. Finally, for each one of these sets, the model needs to be calibrated.

SUPPLEMENTARY NOTES 1
Mobility patterns, Covid-19 influence, and modeling decisions: In this study, we opt to introduce the pandemic disruption as a multiplicative factor that either increases or decreases the expected diagnosis value if the pandemic had never happened and evaluate the changes that occur in the forecast. This multiplicative factor, based on WHO and local reports, accounts not only for the lockdowns but for the whole period in which the healthcare system is disrupted, which is, generally, longer than the lockdown itself.
As facemasks and social distancing have been globally introduced as a countermeasure for avoiding Covid-19 transmission, some effect is expected over the transmission of other airborne diseases such as TB. Interestingly enough, if we observe the changes in mobility according to Google's data 7 , along with the Covid-19 confirmed cases and the reduction in the TB notifications in India from Feb 2020 to Oct 2021 8 (see Supplementary Figure 3), we realize that during high-Covid-19 burden periods, where more strict measures are implemented, presence in households and Grocery & pharmacy places increases. Moreover, changes in mobility patterns suggest that, during several phases of the pandemic, measures forced individuals to interact in closed spaces.
From Supplementary Figure 3 we also learn that those high-incidence Covid-19 periods, and when mobility changes drastically, a match with periods of low TB notification when compared to the same period in 2019 is found. This suggests that neither arguing about rises nor decreases in transmission seems to have enough support from data.

SUPPLEMENTARY NOTES 2
Alternative scenario by force of infection and treatment completions: In the model, the force of infection is calculated as: where β(t) is a half sigmoid calibrated by the model, M (a, a ′ , t) represents the relative contact frequency that an individual of age a has with individuals of age a ′ at time t, with respect to the overall average of contacts that an individual has per unit time with anyone else. Υ(a ′ , t), on the other hand, represents the weighted density of all the infectious individuals within age-group a ′ at time step t. All together build the force of infection, which represents the rate at which infection occurs at time step t for a susceptible individual in age-group a.
Aiming to characterize the effect of a possible modification in λ(a, t), we perform a sensitivity analysis over the TB burden forecasts of the model in the four countries of the study. Given the information presented in Supplementary  Figure 3, it is difficult to quantify a variation in the force of infection, as mobility changes suggest that individuals left public spaces and frequented more private and closed spaces. At the same time, the non-pharmaceutical interventions implemented to halt COVID-19 transmission might have had an effect also on the transmission of TB. Early evidence shows that this is not the case and that the transmission of TB has remained constant or even has increased in certain contexts 9,10 .
Thus, we perform our sensitivity analysis introducing a multiplicative factor that comprises both increases and decreases evenly distributed around the base value in the range β r ∈ [0.85, 1.15], representing variations from −15% to 15% of the base value. The new value of the force of infection is thus obtained as: Regarding the treatment completion, in the main text a fixed value of η = 0.788 multiplicative factor is adopted, as literature findings support it. Here, we also decided to explore a more conservative case in which the multiplicative factor is raised to η = 0.90, which allows more patients to be treated quickly in the model. Then, we perform our simulations and produce forecasts for every alternative scenario and present them in two panels, one for each value of the treatment completion. Supplementary Figure 4 shows the different forecasts when the force of infection is modified accordingly to the multiplicative factor β r for the same value as in the main text, while Supplementary Figure 5 presents the forecasts with the alternative treatment completion value.
In Supplementary Figure 4 the results for the sensitivity over λ(a, t) with a treatment completion scale factor of 0.788, and the results are, as expected, highly dependent upon the force of infection values. That's because the infections in the model are mainly driven through primary infections, i.e., those that occur upon susceptible individuals. Primary infections are highly dependent on λ(a, t) too, as in the model they are calculated as the sum over all age groups a of the product λ(a, t)S(a, t), thus, the fraction of susceptible individuals of age a that gets infected in time step t. Nevertheless, λ(a, t) also affects reinfections, which ultimately leads to a high reduction of the expected burden if it decreases, along with an increment of incidence and mortality if it decreases.
Moreover, in Supplementary Figure 5 the results for the sensitivity over λ(a, t) with a treatment completion scale factor of 0.90 are shown. We recover the same qualitative behavior that in the previous case, but not quantitatively, as all future burdens are lower because we are allowing more sick individuals to be treated than in the previous scenario during the pandemic period.

SUPPLEMENTARY NOTES 3
Recurrent bumps: Alternative scenario where the disruption lasts longer than expected: Given the current situation with COVID-19 variants spreading, it is not clear if the pandemic is reaching an end soon. Under those circumstances, we decided to perform a sensitivity analysis in which we explore the hypothetical scenario of having an extra disruption in diagnosis similar to the one reported in the data.
Thus, we perform a new set of simulations in which we forecast both incidence and mortality when another bump equal to the fitted ones is introduced right after the end of the former. In Indonesia, Kenya, and Pakistan, this is trivial, as there is only one big bump in diagnosis reported, whereas, in India, there were two separate bumps, a first, big one, and a second, narrow one. In this situation, we analyzed separately each case, studying the case of repeating either the first bump or the narrowest, second bump. The results of this analysis are reported in Supplementary  Figure 6, where diagnosis rates, incidence temporal series, and mortality in 2035 are reported in each scenario.     Supplementary Figure 6. Diagnosis and TB burden in alternative scenarios. The first column reports the diagnosis rate D(t), in each country, in the baseline scenario (grey dotted line) compared to the COVID-19 primary disruption and the hypothetical second disruption (blue lines). The second column reports TB incidence temporal series in each country in the same scenarios as before. If a secondary disruption is introduced, baseline incidence levels take more time to be reached. Finally, the third and last column reports the expected mortality in the year 2035 comparing the primary and primary plus secondary scenarios.
In the latter, an increase in mortality leads to even more deaths caused by the pandemic disruption of TB care.